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Abstract. The Lanczos process constructs a sequence of orthonormal vectors v m spanning a 
nested sequence of Krylov subspaces generated by a hermitian matrix A and some starting vector b. 
In this paper we show how to cheaply recover a secondary Lanczos process starting at an arbitrary 
Lanczos vector v m . This secondary process is then used to efficiently obtain computable error 
estimates and error bounds for the Lanczos approximations to the action of a rational matrix function 
on a vector. This includes, as a special case, the Lanczos approximation to the solution of a linear 
system Ax = b. Our approach uses the relation between the Lanczos process and quadrature as 
developed by Golub and Meurant. It is different from methods known so far because of its use of 
the secondary Lanczos process. With our approach, it is now in particular possible to efficiently 
obtain upper bounds for the error in the 2-norm, provided a lower bound on the smallest eigenvalue 
of A is known. This holds in particular for a large class of rational matrix functions including best 
rational approximations to the inverse square root and the sign function. We compare our approach 
to other existing error estimates and bounds known from the literature and include results of several 
numerical experiments. 
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1. Introduction. Our main interest in this paper is in error bounds and error 
estimates for Lanczos approximations of the action of a rational matrix function on a 
given vector. To set the stage, we consider in this introduction the most familiar case 
where the rational function is f(t) = t , i.e., we consider a linear system 

Ax = b (1.1) 

where A e C nx?l is hermitian positive definite (hpd), large and sparse. The method 
of choice to solve such a system is the conjugate gradient (CG) method of Hestenes 
and Sticfel |22j in which — given an initial guess Xq — the m-th iterate x m is taken from 
the affine Krylov subspace 

(A,r ), where K m (A,r ) = span{r , Ar , . . . , A m V } 

such that its residual r m = b—Ax m is orthogonal to K m (A, b). This Galerkin condition 
is equivalent to requiring that x m minimizes the A-norm of the error x^ Xwi. where 
x* = A~ 1 b, over all x e xq + K m (A,b). Algorithmically, CG is implemented using 
short recurrences which makes the method very efficient computationally. 

In order to obtain a stopping criterion for the CG iteration it is important to have 
some information on the error x^ Xjyi- A simple measure for the error is the norm 
of the residual r m = b — Ax m , since by the definition of the A 2 -norm 

II || = (A(x* X m ), A{x* X m )) = || X* X m || - 
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For linear systems, the residual is a quantity which is easily available. If the data in 
the matrix A is not known exactly, \\b — Ax\\ < e translates into ||6 — (A + AA)x\\ < 
e+ \ \AA\\ ■ \\x\\, which bounds the residual of the perturbed matrix A + A A, provided 
we know a bound on || A^4|| . This shows that the residual is a convenient error measure 
particularly when we have inaccurate initial data. 

The energy norm, i.e., the ^4-norm, of the error, ||a;* — x m ||A = (x* — x m , A{x* — 
x-m)) 1 ^ 2 is often the most natural measure for the error, since it relates to physically 
meaningful quantities in many applications. Also, the 2-norm of the error, (a;* — 
x mi x* — Xm) 1 / 2 , is of interest as an operator independent measure for the error, 
particularly in connection with rational matrix functions. 

For any z £ C™ we have 

\\ Z \\ A 2 1/2 \\ Z \\ A 2 /2 

A mm S .—y S A max ana A min S "j^jj^" - A max> 

where A m ; n and A max denote the smallest and largest eigenvalues of A, respectively. 
Hence we have, for example, 

IMIa<-r^-|klU', \\z\\a<-^=\\z\\ A 2 

^min Win 

but the factors and -^== represent only a worst case bound; for a given z the 
ratio of the norms can be substantially smaller. Moreover, the extremal eigenvalues 
or bounds for them arc not necessarily available. 

In [571 HI], see also [TH] it was shown that one can enhance the CG iteration 
at very low computational cost to obtain, in addition to the iterates, estimates and 
bounds for the error of the current iterate in a retrospective manner: For a given small 
positive integer k, error estimates for the iterate at step m can be determined at step 
to + k (^4-norm and 2-norm) ; see also (38J 39 for an estimate for the 2-norm obtained 
at iteration to + 2k. These estimates become more and more precise as k increases. 
While for the A-norm one can obtain lower and upper bounds in this manner, one gets 
only a lower bound in the case of the 2-norm. We will give more details in section [5j 

These error estimates and bounds rely on an elegant theory relating an integral 
representation of the error norms with orthogonal polynomials, Gaussian quadrature 
rules and the Lanczos process, see [17j [18] and the book [19]. In the present paper 
we propose to use this theory in a different manner to be able to determine lower 
and also upper bounds for the error in the 2-norm. Actually, instead of dealing with 
the CG iteration for a linear system, our focus will be, more generally, on Lanczos 
approximations to the action f{A)b of a rational matrix function / on a vector b, 
in this manner continuing the work from |15) . In the matrix function case, there 
is no natural and easily accessible "residual", and, as opposed to the linear system 
case, the A-norm is not a "natural" , physically motivated measure for the error any 
more. We therefore focus on the 2-norm. Note that upper bounds for the error are 
particularly useful, since a stopping criterion based on the upper bound being less 
than a prescribed threshold guarantees that the actual error is indeed less than this 
threshold. 

2. Lanczos process and Lanczos approximations. In this section we recall 
the Lanczos process (cf. [19] or [37]) and the related Lanczos approximations to vectors 
of the form f(A)b, with / a function defined on the positive real axis and b 6 C". 
Note that for / : t —> t^ 1 the vector f(A)b is the solution of the linear system A~ x b. 
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Assuming that V\ £ C™ is normalized to || f 1 1| 2 = 1, the Lanczos process computes 
orthonormal vectors V\ , i>2, • • . such that vi , . . . , v m form an orthonormal basis of the 
nested sequence of Krylov subspaces K m (A 7 vi) 7 m = 1,2,.... Algorithmically, v m+ \ 
is obtained by orthogonalizing Av m against all previous vectors. Since A is hermitian, 
it is actually sufficient to orthogonalize against v m and v m -\, see Algorithm 12.11 



Algorithm 2.1: Lanczos process 
1 choose Vi such that ||ui|| = 1 
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The Lanczos process can be summarized via the Lanczos relation 

AV m = V m+ iT m = V m T m + j3 m ■ Vm+ie^, (2.1) 

where V m — [t>i| . . . \v m ] € C" xm is the matrix containing the Lanczos vectors, e TO = 
(0,...,0, 1) H G C m and 

T — 

m '■ '■ 8 

Hrn — 

$rn — 1 

Pm 

with T m a (real) symmetric tridiagonal matrix. 

Throughout the whole paper we will use the notation ej to denote the j-th canon- 
ical unit vector from C e , where we explicitly mention the dimension I of the space 
when necessary. We just used e m £ C m , e m = (0, . . . , 0, 1) H , and we will often use 
ei G C m ,ei = (1,0, ...,0) H etc. For case of terminology, we will also call T m a 
tridiagonal matrix, although it is not square. 

The following two basic properties of the Lanczos process will be important for 
this paper. 

Lemma 2.1. 

(i) Shift invariance |33j : Let a £ C and put A a — A — a I . Assume that we start 
the Lanczos process for A a with the same initial vector v" = v\ as for the 
Lanczos process for A. Then the matrices V° % ,T m of the Lanczos relation 
i2.1\) for A a , starting with v° , are given by 



p (m+1) xm 



••• 
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(ii) Essential uniqueness of the Lanczos relation: Assume that we have V m +i = 
[V m | v m+1 ] G C" x(m+1 ) with orthonormal columns, andT m € C( m+1 ' xm 
tridiagonal with positive off-diagonal entries, satisfying 

AV m = V m+1 T m . (2.2) 

Then 12.2]) is the Lanczos relation for the matrix A with starting vector V\ , 
i.e., the columns of V m -\-\ are the Lanczos vectors and the entries ofT m the 
corresponding coefficients. 
Proof. The first result follows directly by inspection of the Lanczos process, 
Algorithm 12.11 For part (ii) we note that (|2.2[) , together with the assumption that 
V rn+ \ has orthonormal columns and that T m is tridiagonal, already implies that for 
j = 1, ... ,771 the vector Vj+\ is a positive scalar multiple of the vector Wj which we 
obtain from orthogonalizing Avj against Vj and Vj-±. Since = 1, the scalar 

factor must be l/||tyj||, which is exactly how the Lanczos process proceeds. □ 

The m-th Lanczos approximation x m to the action f{A)b of a matrix function 
f{A) on a vector b is given as 

x m = V m f{V*AV m )V%b = \\b\\ ■V m f{T m )e 1 , 

where the Lanczos process is started with v\ = (l/||6||)-6. The Lanczos approximation 
is motivated by the fact that it is equivalent to setting x m = q m -\(A)b, where q m —i is 
the polynomial of degree m — 1 which interpolates / in the eigenvalues of T m , i.e., the 
Ritz values of A with respect to the subspace K m (A, b). For details, cf. [HI |2"51 15r?l W2\ . 

In the case of a linear system Ax = b we want to compute i.e., we have 

f(t) = t . The m-th Lanczos approximation x m is then given as 

X m = ll&H • VmT-W (2.3) 

This is equivalent to the Galerkin condition b—Ax m -L K m (A, b) with x m € K m (A, b). 
Indeed, if we put x m = V m y m we see that b — AV m y m -L Km{A, b) iff y m solves 

V^(b-AV m y m )=0, 

wherein V^b = \\b\\ei and, due to (|2.ip . AV m = T m . The Lanczos approximation 
x m is thus mathematically equivalent to the m-th iterate of the CG method with initial 
guess xq — 0. Note that if one wants to use an initial guess xq ^ 0, CG iteratively 
obtains corrections to xq which are the Lanczos approximations for A~ 1 b — xq = 
A _1 ro,r = b- Ax . 

The residuals of the CG iterates are related to the Lanczos vectors as stated in 
the following lemma, cf. [3"2"j . 

Lemma 2.2. Let x rn be the m-th CG iterate and r m = b — Ax m its residual. 
Moreover, let w m +i be the m + 1-st Lanczos vector, where the Lanczos process is 
started with v± = (l/H^oH) ' r o- Then 

'm Pm ' ^m+l 

with 

Pm = -&mVm • \\b\\ • Pm, where y m = T~ 1 ei £ C m . 
Moreover, we have p m = (— l) m ||r m || . 

1 



Proof. All stated results can be found in [32] . As an indication for the reader we 
just give a short sketch for the representation of p m in the case Xq = 0: Using (|2.1[) 
the residual r m of the CG iterates x m from (|2.3[) are given as 

b - Ax m = b - \\b\\ ■ AV m T m 1 e 1 =b-\\b\ 
= \\b\\ ■ Vm+i (ei - T m T~ 1 ei) = 
= -ll&ll • Pm ■ (e^T^et) ■ v m+1 . 

□ 

There are various ways to cheaply update the Lanczos approximation x m from 
(|2.3[) to x m +i- The standard way is to update the (root-free) Cholesky factorization of 
T m to one of T m+ i, thus arriving at the familiar coupled two-term recurrence of the CG 
algorithm; see [37], e.g. Another possibility is to use the fact that v m = p m (A)b where 
p m is the characteristic polynomial of T m . The Lanczos relation ()2.1j) gives a three- 
term recurrence for p m . By Lemma l2.21 we have r rn = p m {A)b with p m (t) = p m Pm(t), 
Pm = l/p, n (0). Since x rn = q m -i(A)b with p m (t) = 1 — tq m -i{t), the recurrence for 
the p m implies one for the iterates x m . Note that p m (0) ^ 0, since the zeros oip m are 
the eigenvalues of T m and thus contained in [A m i n , A max ]. We refer to [37] for a more 
detailed description of this approach. For future reference, this three-term recurrence 
variant of the CG method is given in Algorithm 12.21 



• Vm+iT m T m ei 

m ' Vm+1 ( ei "(«T-0 ei ) 



Algorithm 2.2: CG Lanczos (initial guess is zero) 



1 set x-i = 0, po = ||b||, T = 1, vi = (l/po)b 

2 for j = 0,1,... do 



compute ttj+i, Pj+i, v j+2 using the Lanczos process for A 
if j > then 

tj p) i 



T 

end 

Pj+i 

Xj+l 



1 



■j-i 'i- 



-TjPi^- 
J I J a j+ i 

T j( x 3 + ^TT r j) + C 1 - 7 'j)*3-l 



r J + l = Pj+l V j+2 



io end 



In our context, the major advantage of Algorithm l2.2l is that it easily also produces 
the Lanczos approximations for systems of the form (A — o~I)x = b if a [A m i n , A max ] 
and thus, in particular, if a is not real. Indeed, as was observed in [11] [13], e.g., due 
to Lemma 12.11 the characteristic polynomial p m for the shifted system is related to 
that of the non-shifted system via p m {t) = p m (t — a). Since p m (0) = p m {— c) 7^ 0, we 
see that all Lanczos approximations are well-defined and that we can work out the 
three term recurrence for the Lanczos approximations in exactly the same manner as 
in the case without the shift a. We refer to [10] to yet another breakdown free variant, 
based on a short recurrence update for QR- factorizations of the matrices T m ■ For the 
case of real shifts and the standard coupled two-term recurrence, see also [41] . 

Now, let / : t — > X)i=i ^> e a rational function with poles Ui outside the interval 
[Amin, A max ]- Complex poles Cj arise quite naturally in applications, such as rational 
approximations to the exponential function. Let v\ = (1/||6||) ■ b be the normalized 
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vector for b with which we start the Lanczos process. From the shift invariance, 
Lemma |2. If i). it follows that the m-th Lanczos approximation x m to f(A)b is given 



x m = \\b\\ ■ V m ■ {yy>i{T m - ajy'e^ . (2.4) 

This Lanczos approximation always exists, i.e., all matrices T m — tTj/ are non-singular, 
since the spectrum of T m is contained in [A m i n , A max ] and <jj g [A m i n , A max ]. From the 
shift invariance property and relation (|2.3[) . we see that the Lanczos iterate x m from 
()2.4|) is just the linear combination 

p 

x m = J2^ (2-5) 

i=l 

of the Lanczos approximations x m for the solutions of the systems (A — <JiT)x = b 
(with initial guess Xq = for all i). 

By the preceeding discussion, all Lanczos approximations can be obtained via 
Algorithm 12.21 and by Lemma I2.1f i) we get the same Lanczos vectors Vj , indepen- 
dently of the shift <7j. Hence we can modify Algorithm 12.21 to a multishift variant, 
where we perform lines 4 to 9 simultaneously for each shift cjj to obtain all p Lanczos 
approximations Xm (and their linear combination x m ) using short recurrences and 
just one matrix-vector multiplication per step. 

The task to which this paper is devoted is to obtain good error estimates for the 
Lanczos iterates for a single system Ax = b (see (|2.3p ). or the action of a rational 
matrix function f(A)b (the Lanczos iterates from (|2.4[0 . In the case of the CG iterates 
for the system Ax — b, we can express the error as 



A 1 r m with Tm = b — Ax 



n i * 



which, using Lemma 12.21 results in 

\\%* - ZmWl = \Pm\ 2 ■ v" +1 A~ 2 v m+1 , ||a:„ - x m \\ 2 A = \p m \ 2 ■ i^ +1 A -1 i> m +i- (2.6) 

For the Lanczos approximation (|2.4[) for a rational function we can apply Lemma 
to all systems (A — <Xj/)xW = b to see that we have 

r$ = b-(A- ail)x® = pflvm+i, 

so that it is possible to express the error f(A)b — x m = X)f=i w i(^ — b — x r , 

with x m from ()2.5j) as 

P P 
^UiiA-tnl)-^- WiX® =^2ui(A- ajy 1 (b-(A- a 4 /)x«J 

i=l i—1 

P 

= u *p$( A - pjy^m+i 



where 



g m (A)v m+ x, 



p (i) 

.(*) = Et3 =! - w 

i—l 



In the case of a rational function we can thus express the square of the 2-norm of the 
error as 

(g m (A)v m +i) (g m (A)v m+1 ) = v^ +1 h m (A)v m +i, 

where 

h m (t) = g m {t) ■ g m {t) = \g m (t)\ 2 . 

We have thus shown that the square of the 2-norm and of the A-norm of the 
error of the CG iterate (|2.3p as well as of the Lanczos approximations (|2.4p (only the 
2-norm) are given in the form 

v H h(A)v, 

where h is a known rational function defined on [A m i n , A max ] and v is the current (the 
m + 1-st) Lanczos vector. 

3. Error bounds and error estimates. In this section we summarize the key 
aspects of the theory relating moments, quadrature and orthogonal polynomials, see 
[T7l [T8l [19], which allows us to obtain estimates and often even lower and upper 
bounds for quantities of the form v H h(A)v and thus, in light of the discussion at 
the end of section [2j for the norm of the error of the Lanczos approximations (|2.3|) 
and (|2.4j) . The error estimates obtained rely on running a new Lanczos process, now 
starting with v. 

Let (Aj, Zi), i = 1, . . . , n denote the eigenpairs of A where the vectors z, are or- 
thonormal and Ai < A2 < . . . < A„. Expanding v in terms of the basis Zi we can 
write 

n 
t=l 

Since h(A)v = Yl7=i M^Ti^i) cf. [Ml [23], we have 

n „b 
v H h(A)v = Y, h(\i) ■ W 2 - / h(t) d 7 (t), (3.1) 
i=i Ja 

where [a, b] 2 [A m i n ,A max ], the integral is to be understood as a Riemann-Stieltjes 
integral and the discrete measure 7(f) is given as 

[ if t < A min 

7(*)H S-=il^l 2 if < t < A i+1 . 
I E"=il7,f if K<t 

We can now use Gauss, Gauss-Lobatto or Gauss-Radau quadrature rules to ap- 
proximate fv maJI h{t)drj{t). Algorithmically, evaluating these rules turns out to be 
very intimately related to the Lanczos process based on the starting vector v. The 
precise results are as follows, see [T71 [T51 [T^] . 

Theorem 3.1. Let Tk denote the tridiagonal matrix in the Lanczos relation 12. 1]) 
arising after k steps of the Lanczos process with starting vector v, \\v\\ = 1. Assume 
that h is at least 2k times continuously differentiable on an open set containing [a, b] . 
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(i) Approximating iS.lp with the Gauss quadrature rule using k nodes tj £ (a,b) 
gives 

v H h(A)v = ef MT/?)ei + R k [h], where T fe G = f k , 



with the error Rj? [h] given as 



R%{h] = 



fe( 2fc )(£) 
(2fc)! 



3=1 



dj(t), a <b . 



(3.2) 



(ii) Approximating US. 1\) with the Gauss-Radau quadrature rule using k—1 nodes 
tj G (a, b) with one additional node fixed at a gives 

v H h(A)v = e?h(T^)e 1+ R^lh}. 

Here, the tridiagonal matrix T)p R differs from Tk in that its (k, k) entry ak 
is replaced by ak = o, + 5 k -\, where Sk~i is the last entry of the vector 6 with 
(Tk-i — aI)S = k l _ l ek-i- The error i?j? R [/i] is given as 



B%*[h) 



(2k-l)\ 



(t-a) 



f[(t- tj ) 

3 = 1 



d-y{t), a<£<b. (3.3) 



(Hi) Approximating 113. 1\ ) with the Gauss-Lobatto quadrature rule using k — 2 nodes 
tj G (a, b) and two additional nodes, one fixed at a and one fixed at b, gives 

v H h{A)v^e^h{T^)e l +R^[h]. 



Here, the tridiagonal matrix Tj? L differs from T k in its last column and row. 
With 8 and /j, the solutions of the system (T k -i —al)5 = ek-i, (7fc_i —bl)fi = 
ek-i andotk,0k-i the solution of the linear system 



' 1 


-Sk ' 




Oik 




a 


1 


—fj-k 








b 



the tridiagonal matrix T)P L is obtained from T k by replacing a k by a k and 
j3 k —i by f3 k -i- The error R^ h [h] is given as 



R^[h] 



fo( 2fc - 2 >(g) 
(2k-2)\ 



(t-a)(t-b) [[(t-tj) d 7 (t), a<£<6. 

j'=i 

(3.4) 

Inspecting the quadrature error terms R^[h], i?^ R [/i] and Rf L [h], we get the 
following corollary which applies Theorem 13.11 to the rational functions h through 
which we expressed the error of the m-th Lanczos approximation as v^ l+1 h(A)v m+ i 
at the end of section [2] The corollary is thus the key to obtaining error bounds for 
the Lanczos approximations. 

Corollary 3.2. The estimates efh(T^)ei, ef h{T^ K )ei andefh(T^)ei from 
Theorem \3.1\ (i), (ii) and (Hi), resp., represent lower or upper bounds for h3.1\) if the 



derivatives h^ 2k Kh^ 2k 



id M 



2k-2) 



have constant sign on the interval [a,b]. 



This is true in particular for the rational functions h(t) = p 2 n t ,h(t) = p 2 m t 
from H2. 6\) as well as h(t) = with 



for which h^(t) > 0, /i^" 1 ) < for t G (0, oo) and k e N. 

Proof. The only non-trivial part of the corollary concerns the derivatives of h(t) = 
g^it). We first note that by Lemma [2T2l sign(pm) = (— l) m , independently of i. 
Thus, the derivatives of each of the summands of g m (t) have constant sign on [0, oo), 
resulting in 



we thus see that d l h m (t)/dt l < (> 0) for t £ [0, oo) if £ is odd (even). □ 

We just note that there is a connection to results from [8l [12] on the monotone 
convergence of the Lanczos approximations. 

For future reference we state the computational cost of the error estimates from 
Theorem 13. II for those functions h of interest in this paper. 

Lemma 3.3. Assume that T k is given. Let h(t) = t^ 1 or h(t) = t~ 2 or h(t) = 
g(t)g(t) with g(t) = X^Ll t-a- • Then the cost for evaluating the estimates from 
Theorem \3.1\ (i), (ii) and (Hi) is 0(k). 

Proof. Solving a linear system with a tridiagonal matrix of size k has cost O(k). 
Thus the cost for obtaining the matrices T GR and T GL from parts (ii) and (hi) is O(k). 
Denote by T k any of the matrices T^ G ,T^ GR and 1\ GL . For the case h(t) = \p m \ 2 ■ t' 1 
we have to solve the linear system T^y = e\ and to compute e^y which has cost O(k). 
Similarly, for h(t) = \p m \ 2 -t~ 2 we have to solve T k y = e\ and compute y H y, which has 
again cost Oik). Finally, for h m (t) = g m (t)g m (t) we have to solve (Tk — o~iI)y^ = e\ 
for i = l,...,p, compute y = Y^i=i u iV^ an d then y H y, which has total cost 0(pk) 
which is 0(k) if we consider p as fixed. □ 

It is important to note that if one considers evaluating the error estimates for a 
sequence of values for fc, most of the quantities needed can be obtained by an update 
from k to k + 1 with cost 0(1) only. For details we refer to [TH], e.g. 

4. Lanczos restart recovery. We want to use the results of Theorem 13.11 to 
obtain bounds or estimates for the error of the iterate x m of the CG iterate (|2.3[) or 
the Lanczos approximation for a rational function (|2.4[) . To avoid ambiguities, let us 
call the Lanczos process via which the iterates obtained the primary Lanczos 

process. The straightforward way to obtain the error estimates from Theorem 13.11 
would be to perform k steps of a new, restarted Lanczos process which takes the 
current Lanczos vector v m +i of the primary process as its starting vector. This results 
in the restarted Lanczos relation 



AV k T - V k T +1 T T k , where V k T = [v\ \ . . . \ v\] , V k T +1 = [V k T | v\ +1 ] , v\ = v m+1 , (4.1) 





Using 



d l h m (t) 
dt e 




£ \ d?g m {t) d e -ig m (t) 
j ) dP ' dt e ~i 
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and we can now apply the theorem using the tridiagonal matrix Tf arising from the 
restarted process. This is, however, far too costly in practice: computing the error 
estimate would require fc multiplications with A — approximately the same amount of 
work that we would need to advance the primary iteration from step m to m + fc. 

Fortunately, as we will show now, it is possible to cheaply retrieve the matrix T£ 
of the secondary Lanczos process from the matrix T m+k+ i of the primary Lanczos 
process. This Lanczos restart recovery opens the way to efficiently obtain all the error 
estimates from Theorem 13. II in a retrospective manner: At iteration m + fc we get the 
estimates for the error at iteration m without using any matrix-vector multiplications 
with A and with cost 0(k 2 ), independently of the system size n. 

For m = 0, 1, . . . fixed, we define the tridiagonal matrix T 2k+ i as the block of 
T m +\+k ranging from rows and columns max{l, (m + 1) — fc} to (m + 1) + fc. This 
means that T 2k+1 is the trailing (2k + 1) x (2k + 1) diagonal sub-matrix of T m+k+ i, 
except for m + 1 < fc, where its size is (m + 1) + fc x (m + 1) + fc. 

The following theorem shows that for Lanczos restart recovery we basically have 
to run the Lanczos process for the tridiagonal matrix T 2k+ \, starting with the fc + 1-st 
unit vector ek+i 

G C 2fc+1 . 

Theorem 4.1. Let the Lanczos relation for fc steps of the Lanczos process for 
T 2 k+\ with starting vector ek+i £ C 2fc+1 (e m +\ € (ryri+i+k if m -\-\<k) be given as 

T 2k+ iVk = Vk+ifk. (4.2) 

Then the matrix is identical to T k from the restarted Lanczos relation j4.1\ ), 

T\=%. (4.3) 

Proof. For notational simplicity, we only consider the case m + 1 > fc where T 2 ^ + i 
has its full size (2k + 1) x (2k + 1). Recall the Lanczos relation for m + k steps of the 
primary Lanczos process given in (|2.1I) . 

AVm+k = V m +k+lT m +k. 

Since v m+1 € L< m (A,v 1 ) we have K k+1 (A,v m+1 ) C K m+k+1 (A,v 1 ). Hence we 
can express the vectors , i = 1, . . . , k + 1 of the restarted Lanczos process, see (|4.1[) . 
in terms of a basis of K m+ k+i(A, v\). The columns of V m +k+i form such a basis, i.e., 
we have 

vj = V m+k+1 q t , q t e C m+k+1 ,i = 1, . . . , k + 1. 

The vectors qi are orthonormal, since the vectors vj and the columns of V m +k+i are 
orthonormal, too. Putting Qi = [q± | . . . | <&] G £i m + k + 1 ) Xi we thus have 

= V rn+ k+iQi, i = l,...,k, 
so that the restarted Lanczos relation (|4.ip can be written as 

AV m+k+1 Q k = V m+k+1 Q k+1 T 1 k . (4.4) 

All columns of V m + k +iQ k = V k are from K k (A, u m +i) C K m+k (A, vi), so the columns 
of AV m+k+ iQ k are all from K m+k+1 (A, ui), on which the projector V m+k+1 V^ +k+1 
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T m +k+l Qk Qk+1 



Fig. 4.1. Illustration for the proof of Theorem \4-- 1\ Dark grey: non-zero entries; light grey and 
indices in the middle: restarted Lanczos; indices on the left: primary Lanczos. 



acts as the identity. From f|4.4[) we therefore get 

Vm+k+l V^ +k+1 AV m +k+l Qk = V m+ k+lQk+lT k , 

» , ' 

=T m+k + 1 by £0} 

and since Kn+fe+i has full column rank we have 

— r 

T m +k+iQk = Qk+\T k . (4.5) 

The matrix Qk+i has a special sparsity pattern: Since v\ = v m+ i, we have 
Qi = e m +i, the m+l-st unit vector in The matrix T m+ k+i being tridiagonal, 

a trivial induction shows that qi holds non-zeros only in those components j for which 
\j — (to + 1)| < i, see also Figure |4~T1 Consequently, Qk has non-zeros only in rows 
m — k + 2 to m + k, and Qk+i only in rows to — fc + 1 to m + fe + 1. Defining Vj, and 
Vfe+i as the matrices consisting of the last 2k + 1 rows (rows to — k + 1 to m + fc + 1) 
of Qk and Q/c+i, respectively, we see that Vk+i is identical to [Vk \ ik+i], that it has 
orthonormal columns and that -Oi = ek+i, the fc + l-st unit vector in <C 2k+1 . Moreover, 
with T 2 fc+i as defined in the theorem, we obtain from (|4.5[) 

T2k+iVk = Vk+iT k . 

Due to the essential uniqueness of the Lanczos relation, Lemma I2.1f ii) , this finishes 
our proof. □ 

The above theorem shows that we can retrieve T k from T m +k+i by performing 
k steps of the Lanczos process for the (2k + 1) x (2k + 1) tridiagonal matrix T2k+i- 
Here each step needs 0(k) operation^, so that the overall cost for computing T k is 
0(k 2 ). Together with Lemma \3. 31 we conclude that the total cost for computing the 
error estimates from Theorem 13. II is also 0(k 2 ) . 

Algorithm l4.1l shows how we suggest to use the results exposed so far. It computes 
the Lanczos approximations x m for g(A)b with g(t) = t-a- anc ^ ^ ne es timates 

Since Vj is non-zero only in positions k + 1 — (j — 1), . . . , k + 1 + (j — 1), step j actually has only 
cost O(j). This refined analysis does, however, not affect the ©-analysis of the total cost 
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£ m -h, u m -k for the error at iteration m — k based on the Gauss and the Gauss-Radau 
rule. By Corollarv l3.2[ these estimates represent lower and upper bounds, respectively, 
if all poles cr, arc negative and u>i > for all i. The algorithm can be modified to also 
obtain error estimates or bounds based on the Gauss-Lobatto rule and to get bounds 
for the A-norm in case we deal with a linear system. 



Algorithm 4.1: Lanczos approximations for rational function with 2- norm 
error estimates/bounds 



1 set z_i = 0, po = \\b\\, r = 1 

2 choose k 

3 for m = 0, 1, . . . do 



10 

n 

12 
13 
14 



15 
16 
17 



compute a m+ i, /3 m +i, v m +2 using the Lanczos process for A 
for i = l,...,p do /* loop over poles */ 

if m > then 

-l 



1 



end 



y m+l 



' m 1 ^ Tn 



«m + l- 



1 



end 

Ep w 

if m > k then 

perform fc steps of the Lanczos process for the trailing 
(2k + 1) x (2k + 1) diagonal sub-matrix of T m+ i, this yields the 



tridiagonal matrix T£ € 
tm-k = ||5m(7fc)ei|| 2 

Wm-fe = ||5m('7 1 fc ;R )ei||2 



ikxk 



/* g m is given in ( 12 . 7D */ 
/* f^ R given in Theorem l3TT1 (ii) */ 



end 



is end 



5. Comparison with existing methods. Let us first consider a single linear 
system Ax = b. If we solve this system via the CG method, we (implicitly) perform 
a "primary" Lanczos process. Assume that n iterations give the exact solution x n = 
H&H ■V n T~ 1 ei, see (|2.3[) . Then the A-norm of the error of the m-th iterate, x n ||A; 
can be expressed as 

\\x rn - x n \\\ = \\b\\* • (ef T~ 1 e 1 ef T^) . 

The matrix T m is available, in principle, from the CG iteration. Its Cholesky factor- 
ization can be updated easily from one step to the next. Also, it can be shown that 
ef / T~ 1 ei is a positive number which increases montonically with m. This implies 
that r\k,m '■= \\b\\ 2 ■ (e^T^_ k ei — e^T^ei) is a lower bound for \\x m — x n \\\ for any 
k > 0. The challenge is to obtain a numerically stable way to update r]k,m as the 
CG iteration proceeds. Starting with [H [5], many papers have been devoted to this 
topic, see [HJ HOI HZl HH I2H ESI ES] , summarized in Golub's and Mcurant's book [T5] . 
In order to also obtain upper bounds for the A-norm of the error — provided bounds 
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on the spectrum of A are known — the approach sketched so far can be extended to 
include Gauss- Radau and Gauss-Lobatto type estimates by (implicitly) using the ma- 
trices I\ GR and T^ L defined in Theorem QD] Meurant's CGQL algorithm (CG with 
Lanczos quadrature) from [27] (see also [19] and [30]) does so and thus computes 
upper and lower bounds for the ^4-norm of the error along with the CG iterates. As 
was already noted in [22] . r\u im can be computed using just the CG coefficients of it- 
erations m to m + k. This is based on the fact that the CG- algorithm in its standard 
form updates the iterates as x m +i = x rn + jmPm, where the search directions are A- 
orthogonal so that \\x m+k - x m \\ 2 A = J2i=o \l™+i\ 2 ■ ||p m +i|li- The CGLQ algorithm 
[27] implements this approach, and the analysis by Strakos and Tichy [38j [39] shows 
that this approach represents to date the most stable computation of r)k m , because 
problems due to loss of orthogonality in the primary Lanczos process are eliminated. 
Recently, the paper |30) shows how to transport this approach to also obtain upper 
bounds on the A-norm of the error. In all these approaches, the additional cost for 
getting the estimates and bounds is 0(k) per iteration. We note that the approach 
presented in the present work also respects the philosophy, motivated in [39] ■ e.g., 
to rely on local orthogonality only. Indeed, we just work with the quantities from 
iterations m — k, . . . , m + k when computing the error estimate for iteration m. 
For the 2-norm of the error one can use the relation, cf. P3B Corollary 21.7], 



and proceed in a similar manner as before. This is explained in detail in |19| : the 
resulting estimate is not necessarily a lower bound any more. Another estimate for the 
2-norm of the error was proposed in [35]. It involves the CG coefficients of iterations 
m to m + 2k to obtain an estimate for the error at iteration m and it was shown there 
that this estimate is actually a lower bound. Note that none of the approaches for the 
2-norm estimates has a "Gauss-Radau" counterpart which would allow for estimates 
that represent upper bounds. 

For the case of a rational matrix function g(A)b, with iterates obtained via the 
Lanczos approximation (|2.4[) , the paper fJ5] extends the approach of (35J to get es- 
timates for the 2-norm of the error. If all poles cr^ arc negative and all coefficients 
tUi are positive, these estimates were proven to be lower bounds in |15) . If there are 
complex poles, the estimates in [T5] were derived for the CG method based on the 
bilinear form (x, y) = y T x rather than y H x. Again, there is no variant which would 
compute upper bounds for the error in the 2-norm. Therefore, in j!6j . a different 
approach was used to get an upper bound: Using a global optimization algorithm, 
the maximum c rn = max tg [ Amin Amax ] |<7 m (i)| for g from (|2.7[) is bounded from above 
by c m which then is a bound for the error in the 2-norm. While one succeeds in 
getting upper bounds for the error with this approach, it is quite costly due to the 
global minimization, and the upper bounds are not necessarily very precise. 

As an alternative to the quadrature approach for the case of the CG iteration, 
the paper [3J suggests to use vector extrapolation on the current residual r and Ar to 
obtain error estimates. They can be modified to yield bounds, provided the smallest 
and largest eigenvalue of A are known. The computation of Ar can be avoided by 
using quantities available from the Lanczos process, but the extrapolation requires 
inner products with full vectors. Hence the cost for the error estimates of [3J is 0(n). 

6. Numerical experiments. In this section we illustrate the quality of the error 
bounds developed in this paper for several rational functions arising from applications. 
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Fig. 6.1. Error bounds and exact error for the CG iterates for Ax = b, A matrix slrmq4ml for 
k = 10. Left column: no preconditioning, right column: incomplete Cholesky preconditioning. Top 
row: Algorithm^. 1\ bottom row: comparison with other methods. 



All experiments were carried out on a standard workstation using Matlab R2011a. 

As a first example, we look at the CG iterates for a linear system Ax = b, 
where A is the matrix slrmq4ml from the matrix group Cylshell of the University 
of Florida matrix collection, see [6j [7] . It arises from a finite element discretization 
of a cylindrical shell. We chose this example because of its relatively high condition 
number which is of the order of 10 6 , despite its small size (n = 5, 489). The solution 
x was generated randomly, then explicitly computing b = Ax as the right hand side. 
We do not a priori know a lower bound a on the spectrum. We thus monitored the 
smallest eigenvalue of the tridagonal matrix T m which is known to converge to A m j n 
from above. As soon as for some iteration, fco say, the relative change in the smallest 
eigenvalue A was less than 10 -4 , we put a = 0.99 • A. Hence, in principle, we obtain 
upper bounds only for iterations beyond /co- To have the complete picture, though, 
we afterwards added the upper bounds obtained with this value for a for iterations 
1 to fco — 1- For the unpreconditioned system (left column of Figure 16. ip we see 
that the CG method converges very slowly. Even with k = 10, the error bounds 
obtained with Algorithm 14. 1 1 are quite severe over- and underestimations of the exact 
error. The second row gives error estimates obtained with other methods. On the 
one hand, we used the method from |38j in which we chose the parameter k = 5, 
meaning that we use information from the next 2k = 10 CG iterations, just as we do 
in Algorithm 14. II with k = 10. On the other hand, we also tested the recommended, 
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third extrapolation based method from [3] . This method can be modified to also give 
error bounds, provided we know the extremal eigenvalues of A, and the results for 
these bounds are also given. We can see that the bounds and estimates obtained with 
the extrapolation based methods are worse than the bounds obtained via the new 
approach. The same holds for the other extrapolation based methods from [3], for 
which we do not reproduce the results here. The estimate from |38j , which actually is 
a lower bound in this case, is comparable to and slightly more accurate than the lower 
bound obtained with Algorithm 14.11 The right column of Figure 16.11 shows that we 
get much faster convergence and much better error bounds when we use a standard, 
0-fill incomplete Cholesky factorization of A as a preconditioncr. This means that we 
perform Algorithm 14.11 for the matrix L" 1 AL~~ H instead of A and L~ x b instead of 
b, where A = LL H + R is the incomplete Cholesky factorization of A. The bound a 
for the smallest eigenvalue of the preconditioned matrix was obtained as before. A 
comparison between the new approach (first row) and existing methods (second row) 
leads to similar conclusions as in the non-preconditioned case. We also included results 
obtained with the method from [29j for comparison. The error estimates obtained for 
comparable parameters (k = 10) appear quite similar to those obtained with our 
new method. For the non-preconditioned matrix the error estimates with the method 
from [53] were extremely small (e.g., 10~ 30 after 1000 iterations, and even less for 
later iterations), so that we did not reproduce them in the left column of Figure [6TTI 
We suspect that in this case the bad conditioning induces an instability which we 
could not avoid although we used the most stable, Q-R-based implementation of the 
method from [29] , 

It can be noticed that for the preconditioned system the upper bounds obtained 
with Algorithm [4J] arc not as close to the exact error than the lower bounds, whereas 
it is the other way around for the non-preconditioned system. As a rule, we observed 
that for better conditioned systems, the lower bounds tend to be closer than the upper 
bounds, even if we work with very accurate estimates a for the smallest eigenvalue. 
A theoretical justification for this behavior is still missing. 

Our second example deals with rational approximations to the sign function, as it 
is used within the Neuberger overlap operator in lattice QCD. QCD (quantum chro- 
modynamics) is the physical theory of quarks and gluons as the constituents of matter. 
To evaluate this theory non-perturbatively, one has to work with discretizations on a 
4-dimcnsional space-time lattice amongst which the Wilson fermion matrix I—nDy/ is 
the most important one. Dw describes a nearest neighbor coupling on an cquispaced 
4d grid where each grid point holds 12 variables. Recent progress aiming at preserving 
the physically important "chiral symmetry" (cf. [2]) on the lattice lead to Neuberger's 
overlap operator Dn which has the form P + sign(PZ?w), P a simple unitary matrix. 
The matrix PDw is hermitian and indefinite. In order to solve systems with Dm one 
uses a Krylov subspace method, so that in each step one has to compute sign(PDw)b 
for some vector b. The matrix Q := PDyy is hermitian and indefinite. We report on 
numerical results obtained with the matrix D available in the matrix group QCD at 
the UFL sparse matrix collection as configuration conf 5 . 4-0018x8-2000 .mtx with 
k c = 0.15717. Q is then given as Q = P(I — |k c £)), with P the permutation 
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The dimension of the system is n = 12 • 8 4 ~ 50 000. We compute sign(Q)b for 
a randomly generated vector b. To this purpose, we first compute two numbers 
< ai < a2 such that spcc(<5) C [—02, — a{\ U [ai,a2]. We then approximate sign(t) 
on [— 02, — ai] U [ai, 02] with the Zolotarcv rational approximation, see [34]. It has the 
form g(t) — $3i=i ' a i > an d it is an ^-best approximation. The number 

of poles s was chosen such that the ^-error was less than 10 -7 , that is s = 11. To 
compute g(<3)6, we actually computed g(Q) ■ (Qb) with 



Since Q 2 is hermitian and positive definite, Algorithm 14.11 will produce lower and 
upper bounds for the exact error. In our computations we used a deflation technique 
common in realistic QCD computations [40]: We precompute the first, Ai,...,A g 
say, eigenvalues of smallest modulus. With II denoting the orthogonal projection 
onto the space spanned by the corresponding eigenvectors, we then have sign(Q)6 = 
sign(Q)(7 — U)b + sign(Q)(II6). Here we know sign(Q)(n&) explicitly, so that we 
now just have to approximate sign(Q)(7 — 11)6. In this manner, we effectively shrink 
the eigenvalue intervals for Q, so that we need fewer poles for an accurate Zolotarev 
approximation and, in addition, the linear systems to be solved converge more rapidly. 
In QCD practice, this approach results in a major speedup, since sign(Q)6 must 
usually be computed repeatedly for various vectors b. For Algorithm 14.11 it has the 
additional advantage that we immediately have a very good value for a, the lower 
bound on the smallest eigenvalue of Q 2 for which we can take X 2 . 

Figure 16.21 shows the results that we obtain deflating q — 30 eigenvalues. The 
(effective) condition number of the (deflated) matrix Q 2 is approximately 50, 000. As 
in our first example, the top row reports upper and lower bounds from Algorithm 14. II 
whereas the bottom row gives the estimates from |15j which we know to be a lower 
bound in this case. As before, we see that going from k = 2 to 10 results in a 
significant gain in accuracy. 

Figure [6T31 gives the results for Algorithm 14. II with k = 10 for a configuration on 
a 16 4 lattice, resulting in a matrix Q of dimension ss 800, 000. We again deflated 
the 30 smallest eigenvalues. The condition number of the deflated matrix Q 2 is now 
approximately 3,700, i.e., less than for the 8 4 lattice. Therefore, the convergence 
speed as well as the quality of the bounds are better than for the 8 4 lattice. 

As a last example we consider the [10/10] Pade approximation to the exponential 
function. Using [m/m] Pade approximations is very common for approximating the 
matrix exponential; cf. [T][2S][2T]. Matlab's expm uses Pade approximations along 
with the scaling and squaring approach [25]. The partial fraction expansion of the 
[10/10] Pade approximation to the exponential has the form 



where all the coefficients u% and poles <x; are non-real. We want to compute (I+g(A))b, 
so we focus on g{A)b. Due to the complex poles and coefficients we cannot easily 
obtain information on the sign of the derivatives of g m , implying that this time we do 
not know whether Algorithm 14. II really obtains bounds for the error. 

Figure 16.41 reports the results that we get when computing g(A)b with A the 
negative discrete Laplacian on a 200 x 200 grid, b a random vector. Computing 




(6.1) 
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exp(A)b for the negative discrete Laplacian A (or a scalar multiple thereof) is a 
common task when using exponential integrators in semi-discrctizcd parabolic partial 
differential equations, see |21) . 

For this example, taking k = 2 in Algorithm 14.11 is already sufficient to obtain 
error estimates which are very close to the exact error. Although we do not have 
a theoretical justification, the error estimates produced by Algorithm 14.11 turn out 
to indeed represent (tight) lower and upper bounds for the error. The right part 
of Figure 16.41 shows the error and the error estimates obtained with the approach 
suggested in [TS]. Note that due to the complex shifts, this approach amounts to 
perform a variant of the CG method which uses the indefinite bilinear form (x, u)t = 
y T x on C™. This method thus does not obtain the same iterates as Algorithm 14.11 
but we see that the norm of the error is quite similar for both approaches. The error 
estimate from [15] is much less precise for the first half of the iterations, whereas for 
the second half of the iterations it is comparable to the estimates from Algorithm 14. II 

The matrix exponential is also used in the analysis of large graphs like those de- 
scribing social networks. If A is the adjacency matrix of such an undirected graph, 
then exp(A)ij denotes the communicability (see [9]) between nodes i and j. Accord- 
ingly, exp(A)ei gets us the communicabilities of node i with all other nodes. For our 
numerical experiments we took i = 1 and we used the graph dblp-2010 from group 
LAW of the University of Florida sparse matrix collection. It describes the co-author 
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Fig. 6.3. Error bounds and exact error for Zolotarev approximation for sign(Q) in lattice QCD, 
16 4 lattice, Algorithm ^. 1\ k = 10. 
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Fig. 6.4. Error bounds and exact error for Fade approximation to the exponential, A negative 
discrete Laplacian on 200 X 200 mesh. 

relation between all authors appearing in the DBLP database of journal papers in 
computer science as of some day in the year 2010. This graph has more than 300,000 
nodes and about 1.5 million vertices. Note that A is an indefinite matrix. For this 
matrix the error estimates from |15| could not be applied, since the use of the indefi- 
nite bilinear form produced breakdowns in the algorithm. In Figure 16.51 we give only 
one (the "lower bound" £ m ) of the estimates from Algorithm 14. II for k = 2 and k = 10. 
The "upper" bound u m behaves quite similarly (where we compute a as in the second 
example). For k = 2 the estimates appear to systematically represent a lower bound. 
For k = 10 we clearly see that the estimate does not represent an upper nor lower 
bound for the error, but we get an estimate for the error which is never more than a 
factor of 5 off the exact error. 

7. Conclusions. Building on the theory of Golub and Meurant we proposed a 
novel use of this theory which allows, in particular, to obtain estimates, lower and 
upper bounds for the 2-norm of the error of the action of a rational matrix function on 
a vector. Such estimates are important for rational matrix functions as they can be 
used as a stopping criterion. Upper bounds have the advantage to provide a reliable 
stopping criterion: If we stop the iteration once the upper bound is less than a given 
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Fig. 6.5. Error bounds and exact error for Pade approximation to the exponential, A adjacency 
matrix for dblp-2010 graph, k = 2 and k = 10 in Algorithm ^. 1\ 



threshold e, the exact error is also smaller than e. Our new approach relies on a 
secondary, restarted Lanczos process which can be obtained very efficiently at cost 
which is independent of the matrix size. Numerical examples show that the new 
approach can give very good error bounds with the quality of the bounds depending 
on the number k of steps in the secondary Lanczos process and on the condition of the 
matrix function. The effects of rounding errors were not studied, but our approach 
follows the philosophy put forward in [55J I3H] in that it only makes use of "local 
orthogonality" , the secondary Lanczos process involving just the last 2k iterations 
of the primary Lanczos process. Our approach can, in principle, be extended to the 
preconditioning idea from [24], where instead of f{A)b one computes r(rl— A)~ 1 b with 
r the rational function r(t) = /(r — t^ 1 ), see also [15j[35]. However, the conditions of 
Corollary [321 on the signs of the poles and the coefficients will usually not be fulfilled 
for r, so that we cannot expect to obtain lower and upper bounds. 
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